xgen-mm-vid-phi3-mini-r-v1.5-32tokens-8frames
xGen-MM-Vid (BLIP-3-Video) is an efficient and compact vision-language model equipped with an explicit temporal encoder, specifically designed to understand video content.
Video-to-Text
Safetensors English
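
The model name suggests that each clip is represented by 8 frames, which the temporal encoder condenses into 32 visual tokens. As a rough illustration of the frame-selection step only, the sketch below picks 8 evenly spaced frame indices from a clip. The helper name `sample_frame_indices` and the midpoint-sampling strategy are assumptions made for illustration; they are not taken from the model's own preprocessing code.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int = 8) -> List[int]:
    """Return num_frames indices spread evenly over a clip.

    Hypothetical helper: the clip is split into num_frames equal segments
    and the frame nearest each segment's midpoint is taken. The default of
    8 is an assumption read off the "8frames" suffix in the model name.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    step = total_frames / num_frames
    # Clamp to the last valid index; short clips simply repeat frames.
    return [min(total_frames - 1, int(step * (i + 0.5))) for i in range(num_frames)]


if __name__ == "__main__":
    print(sample_frame_indices(80))
```

The selected frames would then be decoded and passed to whatever image processor the checkpoint ships with; that part is deliberately left out here because it depends on the model's own loading code.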